ABSTRACT

This paper reports on a survey of the perceptions of 86 
state-level educational assessment personnel or other professionals involved 
in educational assessment regarding different ways in which test scores are 
influenced by accommodations for participants with disabilities and how 
scores obtained under accommodated conditions should be treated in reporting. 
It notes three general options: (1) report all scores in the aggregate (i.e.,
do not differentiate between accommodated and non-accommodated test scores);
(2) report accommodated scores separately; and (3) report accommodated scores
both in the aggregate and separately. A survey instrument asked
participants to classify accommodations into one of three categories 
distinguished by the degree to which accommodations influence performance. 
Accommodations were also classified as either presentation accommodations, 
response accommodations, setting accommodations, or timing accommodations. 

The study found a relatively high degree of disagreement among participants 
on the degree to which various accommodations alter test constructs and thus 
should be treated differently. Accommodations seen as least likely to 
influence performance were small group administration, use of a large print 
answer sheet, special lighting, or flexible scheduling. Accommodations seen 
as most likely to influence performance included provision of extra time, 
test administration at home, use of a calculator for computations, or use of 
a spell checker. The paper suggests that accommodations be viewed as a means 
of establishing optimal testing conditions rather than as a means of
"leveling the playing field." Appendices include the survey protocol and
detail on level of agreement for each accommodation. (Contains 12 references, 
Executive Summary

Policies intended to increase the participation of students with disabilities in state and local
assessment systems have been in full force for several years. Test accommodations constitute the
most frequently used alternative to increase their participation rates. Because accommodations 
continue to be so widely applied despite the limited amount of empirical research available 
demonstrating how they affect test scores, it is necessary that sound, rational decisions be made 
about the use of accommodated test scores. 

One challenge confronting state education agencies is to determine the most appropriate way 
to report the test scores of those students receiving accommodations. There are three general 
options: 

1. Report all scores in the aggregate (i.e., do not differentiate between accommodated and
non-accommodated test scores)

2. Report accommodated scores separately 

3. Report accommodated scores both in the aggregate as well as separately 

Each option reflects different beliefs about how accommodations influence test scores. 

The future of accommodations research depends, in part, on the perceived need for the research 
as well as continued availability of the resources to conduct such research. The opinions of the 
stakeholders, particularly those influencing policy on how to report scores from accommodated 
tests, may provide a barometer of the perceived need for further research. 

The present study is a survey of the perceptions held by people familiar with policy or research 
on the way in which test scores are influenced by accommodations and how scores obtained 
under accommodated conditions are to be treated in reporting. The results show that the extent
of agreement about how accommodated scores should be treated depends on the accommodation.
The study also shows how deep-seated beliefs lead some respondents to consider almost
no accommodation as changing the construct, whereas other respondents consider almost all 
accommodations as influencing the construct being measured. 
Overview






Policies intended to increase the participation of students with disabilities in state and local
assessment systems have been in full force for several years. Test accommodations constitute the
most frequently used alternative to increase participation rates of these students. The widespread 
use of test accommodations has spawned a flurry of empirical studies to explore questions such 
as what accommodations should be used and with whom. It will be some time before answers 
to these questions are sufficiently refined to enable strong conclusions about the benefits and 
drawbacks of test accommodations. In the meantime the use of accommodations flourishes 
(American Council on Education, 2002; Thompson & Thurlow, 2001). Because accommodations
continue to be so widely applied despite the limited amount of empirical research available
demonstrating how they affect test scores (Thompson, Blount, & Thurlow, 2002), it is necessary 
that sound, rational decisions be made about the usability of accommodated test scores. 

One challenge confronting state education agencies is to determine the most appropriate way of 
reporting test scores for students receiving accommodations. There are three general options: 

1. Report all scores in the aggregate (i.e., do not differentiate between accommodated and
non-accommodated test scores)

2. Report accommodated scores separately 

3. Report accommodated scores both in the aggregate as well as separately 

Each option reflects different beliefs about how accommodations influence test scores. Option 1
implies that the accommodated test scores measure the same construct in the same way as
non-accommodated test scores. Option 2 implies that the accommodation changes the meaning of
the test score; therefore, the scores must be considered separately. Option 3 implies uncertainty
about how the accommodation influences test scores, if at all. The third option reflects the reality
of test accommodations research: we simply do not have definitive evidence about how each
accommodation or combination of accommodations influences test scores. Options 1 and 2 may
represent personal biases more than definitive empirical evidence.

There is some evidence that the opinions of people who are familiar with this issue vary, even 
to the point that the opinions are in direct opposition. State guidelines about how to report 
accommodated test scores provide evidence of this. The lists of approved and non-approved 
accommodations in each state show that an accommodation that is approved in one state may 
not be approved in another state, even when the same assessment is used (Thurlow, House,
Boys, Scott, & Ysseldyke, 2000; Thurlow, Thompson, Lazarus, & Robey, 2002). Federal regulations
that require states to report test scores for students with disabilities in the aggregate and
separately, in combination with the high stakes placed on the scores, make this reporting issue
particularly salient. Furthermore, the conflict between what measurement theory regards as
essential for test score comparisons and the provision that any and all accommodations must be
made available to students with a disability compounds the issue (Heumann & Warlick, 2000),
raising tensions and uncertainty.

The future of accommodations research depends, in part, on the perceived need for the research 
as well as continued availability of the resources to conduct such research. The opinions of the 
stakeholders, particularly those influencing policy on how to report scores from accommodated 
tests, may provide a barometer of the perceived need for further research. For instance, one 
would expect little perceived need if all of the people influencing policy decisions shared the 
same opinion on how to treat test scores obtained under non-standard conditions. On the other 
hand, if the opinions of this stakeholder group varied, then a need for more research, or at least 
more discussion of the issues, would be indicated. The extent of need for more research likely 
varies by the type of accommodation. Those accommodations on which there is little agreement 
about perceived effects on test score interpretation deserve most of our attention. 

The present study is a survey of the perceptions held by people familiar with policy or research 
on the way in which test scores are influenced by accommodations and how scores obtained 
under accommodated conditions are to be treated. Rather than asking participants directly 
about how they believe an accommodation influences test score interpretation, the study asked 
participants to classify accommodations into one of three categories that can be distinguished 
by the degree to which accommodations influence performance using a classification scheme 
developed by CTB/McGraw-Hill. 



Instrument 

In an effort to create guidelines for using test results from standardized tests administered under
non-standard conditions, CTB/McGraw-Hill created a framework for classifying accommodations
(CTB/McGraw-Hill, 2000). Accommodations were framed according to their expected
influence on student performance, and then according to how the results should be reported.
Category 1 accommodations are not expected to influence test performance in a way that would
alter the characteristics of the test. According to the CTB/McGraw-Hill document, test scores
for students receiving such accommodations should be interpreted as test scores from standard
administrations, and these scores should be aggregated with the scores of standard
administrations. According to CTB/McGraw-Hill, Category 2 accommodations are expected to have some
influence on test performance, but should not alter the construct the test was designed to measure.
Category 2 accommodations may boost test performance; therefore, the type of accommodation
used should be considered when interpreting the test scores. Scores obtained under Category
2 accommodations can be aggregated with scores obtained under standard conditions, but the
scores should also be reported separately and the number and percent of students using such
accommodations should be clearly indicated along with summary statistics. Category 3
accommodations, as classified by CTB/McGraw-Hill, are expected to alter the construct that the test
was designed to measure. In the absence of research demonstrating otherwise, scores obtained
under Category 3 accommodations should be interpreted in light of how the accommodation is
thought to influence performance. Some of the Category 3 accommodations are content specific,
for example, receiving the read-aloud accommodation on a reading test, or using a calculator
on math computation items. Score interpretation should consider the accommodation-content
combination and whether the accommodation changes what the tests were designed to measure.
According to CTB/McGraw-Hill, scores from Category 3 accommodations should be reported
in aggregated and disaggregated forms, and the number and percent of students using such
accommodations should be clearly indicated along with summary statistics.
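The three categories and their reporting treatments, as summarized above, can be written down as a small lookup table. The sketch below is only an illustration of that summary: the class name, field names, and function name are hypothetical and do not come from the CTB/McGraw-Hill document. Note that, as described here, Categories 2 and 3 share the same reporting treatment and differ only in whether the construct is expected to change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportingRule:
    """Reporting treatment for one accommodation category (illustrative only)."""
    alters_construct: bool     # expected to change what the test measures
    aggregate: bool            # include with standard-administration scores
    report_separately: bool    # also report as a disaggregated group
    report_usage_counts: bool  # show number and percent of students using it


# Hypothetical encoding of the three categories as summarized in the text.
REPORTING_RULES = {
    1: ReportingRule(alters_construct=False, aggregate=True,
                     report_separately=False, report_usage_counts=False),
    2: ReportingRule(alters_construct=False, aggregate=True,
                     report_separately=True, report_usage_counts=True),
    3: ReportingRule(alters_construct=True, aggregate=True,
                     report_separately=True, report_usage_counts=True),
}


def reporting_rule(category: int) -> ReportingRule:
    """Look up the reporting treatment for Category 1, 2, or 3."""
    try:
        return REPORTING_RULES[category]
    except KeyError:
        raise ValueError(f"unknown category: {category!r}") from None
```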

Using the three categories of accommodations, a survey was created in which participants were 
asked to assign each of 44 accommodations to one of the three categories. The categories were 
designed to be mutually exclusive, but they might not have been exhaustive. 

Participants 

Participants chosen for this study were familiar with accommodations research or state policies
on the use of test accommodations. A survey was sent to each of the 50 state assessment
directors, each state special education director, and to individuals who have presented research 
on test accommodations or have published accommodations research. One hundred and thirty 
surveys were mailed initially, and also re-mailed to those who had not responded to the first 
mailing. In all, we obtained responses from 86 individuals (66% of those sent). Of these, 63 
(73%) provided a single rating for each accommodation, and 77 (89%) provided a single rating 
for at least 40 of the 44 accommodations. 

Of the 86 respondents, 60 were state department of education personnel, either assessment 
directors or special education directors. Eleven respondents were involved in accommodations 
research or in drafting policy guidelines on the use of accommodations. The other 1 1 respondents 
were practitioners or described themselves by checking multiple categories. 



Accommodations 

The accommodations used in this survey were chosen to be representative of the accommodations
used in practice. This list of 44 accommodations was not meant to be exhaustive. A
popular classification scheme was used to cluster the accommodations and to ensure that these
accommodations represented different aspects of test administration that could be accommodated
(Thurlow, Ysseldyke, & Silverstein, 1993). The four categories were: (1) presentation,
(2) response, (3) setting, and (4) timing. According to this scheme, accommodations can be
distinguished by the aspect of the standard administration that is altered by the accommodation.
For instance, presentation accommodations alter the standard presentation of the test; presenting
test material in Braille is a common example of a presentation accommodation. Response
accommodations alter the way in which examinees respond to test items; marking the answer in
the test booklet as opposed to a bubble sheet would be an example of a response accommodation.
Setting accommodations usually refer to changes in the typical size of the group to which the
test is administered or the location where the test is taken; taking the test in a small group is an
example of a setting accommodation. Timing accommodations typically refer to allowing the
examinee extra time to complete the test. There were 20
presentation accommodations, 14 response accommodations, 5 setting accommodations, and 5
timing accommodations (see Appendix A).



The categories into which respondents placed each of the 44 accommodations in the
CTB/McGraw-Hill list were examined by creating frequency distributions. These were plotted as
bar graphs according to the four-category classification scheme (presentation, response, setting,
timing).
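A frequency distribution of this kind can be tabulated with a few lines of code. The sketch below is an assumption-laden illustration, not the authors' procedure: the function name is hypothetical, and the choice to drop missing or multiple ratings before computing percentages is an assumption, since the text does not say how such responses were handled.

```python
from collections import Counter


def category_distribution(ratings):
    """Percent of respondents choosing Category 1, 2, or 3 for one accommodation.

    `ratings` holds one entry per respondent. Entries other than 1, 2, or 3
    (e.g., None for a missing or multiple rating) are dropped; this handling
    is an assumption made for illustration.
    """
    valid = [r for r in ratings if r in (1, 2, 3)]
    if not valid:
        return {1: 0.0, 2: 0.0, 3: 0.0}
    counts = Counter(valid)
    return {cat: 100.0 * counts[cat] / len(valid) for cat in (1, 2, 3)}


# Made-up ratings from five respondents, one of whom gave no single rating.
example = category_distribution([1, 1, 1, 2, None])
```

With these made-up ratings, three of the four valid responses fall in Category 1 and one in Category 2, so the bar for Category 1 would be three times as tall as the bar for Category 2.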



Presentation Accommodations 

Figure 1 displays the results for the 20 presentation accommodations. As is evident in the bar
graph, there was little variability in the classification of the first four presentation accommodations
(visual magnification, large print, audio amplification, and place markers); most of the
respondents (over 90%) chose Category 1 for these four accommodations. The next four
accommodations all represent ways of presenting test directions (read aloud, audio, signed, and
highlighted). At least 70% of the respondents also chose Category 1 for these accommodations.
The CTB/McGraw-Hill category chosen by respondents for the remaining presentation
accommodations varied much more. Most respondents classified items read aloud, audio items,
communication device, and computer presentation into either Category 1 or Category 2. More
than 50% of the respondents classified accommodations representing oral presentation of the
reading test and providing a calculator for a math computation test as Category 3.

Figure 1. Frequency Distribution of Category Ratings for Presentation Accommodations




Response Accommodations 

Figure 2 displays the results for the 14 response accommodations. More than 85% of the
respondents placed responding in the test booklet, large print, using a template, and using graph
paper into Category 1. There was little agreement as to how to treat response accommodations
such as scribes and spell checkers used when spelling was not scored. The use of a spell checker
when spelling was scored was more consistently categorized by respondents into Category 3.
Still, some respondents did place this accommodation into Category 2, and some placed it in
Category 1.

Figure 2. Frequency Distribution of Category Ratings for Response Accommodations

Setting Accommodations

Figure 3 displays the distribution of responses to the setting accommodations. Respondents almost
unanimously agreed that taking a test alone or in a small group, or the use of adaptive furniture
or special lighting or acoustics, did not alter the meaning of the score, and therefore could be
placed in Category 1, where scores should simply be reported in the aggregate as though they
are standard scores. The accommodation of taking the test at home or in a care facility received
less consistent placement into Category 1. Roughly one-third of the respondents placed this
accommodation into Category 2.

Figure 3. Frequency Distribution of Category Ratings for Setting Accommodations

Timing Accommodations

Figure 4 displays the distribution of responses to the timing accommodations. Respondents
unanimously agreed that the first two accommodations (additional breaks and flexible scheduling)
do not alter the meaning of test scores and therefore belong in Category 1. These represent
scheduling accommodations that do not result in extra testing time. The timing accommodation
of taking the test over several days, but still not resulting in extra time, showed greater variability
in responses. About 55% of the respondents placed this accommodation into Category 1, 35%
placed it into Category 2, and 15% placed it into Category 3. The majority of respondents placed
the timing accommodations of extra time on a timed test and extra breaks on a timed test into
either Category 2 or Category 3.

Figure 4. Frequency Distribution of Category Ratings for Timing Accommodations



Agreement Among Respondents 

Table 1 is a count of the accommodations by level of agreement. Low agreement was defined 
as less than 50% of the respondents placing the accommodation into the same category, moderate
agreement was defined as 50 to 89% of respondents choosing the same category, and high
agreement was defined as 90% or more choosing a particular category for an accommodation. 
There was low agreement on 14 of the accommodations, moderate agreement on 17, and high 
agreement on 13 of the accommodations. Specific accommodations by level of agreement 
among participants can be found in Appendix B. 
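The agreement thresholds defined above can be expressed as a short function applied to the percentage of respondents in the most frequently chosen category. This is a minimal sketch with a hypothetical function name. The text gives the moderate band as "50 to 89%" and the high band as "90% or more"; treating every value from 50 up to, but not including, 90 as moderate is an assumption made here to cover non-integer percentages.

```python
def agreement_level(percent_by_category):
    """Classify agreement from the share of respondents in the modal category.

    `percent_by_category` maps each category (1, 2, 3) to the percent of
    respondents who chose it for a single accommodation.
    """
    top_share = max(percent_by_category.values())
    if top_share >= 90:
        return "high"
    if top_share >= 50:
        # Assumption: values between 89 and 90 are counted as moderate.
        return "moderate"
    return "low"
```

For instance, an accommodation rated Category 1 by 95% of respondents would be classified as high agreement, while one whose largest category holds only 40% of respondents would be classified as low agreement.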



Table 1. The Number and Percent of Accommodations by Level of Agreement

Agreement    Number    Percent
Low          14        32
Moderate     17        39
High         13        29



Table 2 is a list of the accommodations in which more than 90% of the respondents indicated 
that the accommodation belonged in Category 1. Included in this list are three presentation
accommodations, three response accommodations, and four setting accommodations. None of the
timing accommodations were agreed upon by 90% of respondents as belonging to Category 1. 



Individual Bias 
The degree to which some individuals favor accommodations regardless of the type of
accommodation was analyzed by examining categorizations of accommodations that alter some
feature of the test directly involved in test performance and that therefore might be expected
to change the construct of the test. For instance, oral presentation during a reading test is
perceived by some respondents to alter what the test was intended to measure, reading skills. Six
accommodations of this type were identified (see Table 3) and the percentage of respondents
choosing Categories 1 or 2 was calculated. Twenty-five percent of the respondents indicated that
the scores obtained from using a calculator on a computation test should be treated as Category
1 or Category 2, and 58% indicated that scores obtained with extra time on a timed test should
be treated as Category 1 or Category 2.
Table 2. Accommodations That Respondents Unanimously Agreed Belong in Category 1

Accommodation                                       Percent Category 1

Presentation
  Magnifying equipment                              95
  Large-print                                       96
  Audio amplification                               92

Response
  Maintain place                                    98
  Mark responses in test booklet                    93
  Mark responses on large-print answer document     95

Setting
  Take test alone                                   92
  Take test in small group                          93
  Use adaptive furniture                            93
  Use special lighting                              94



Table 3. Percent of Respondents Assigning Category 1 or 2 to Accommodations That Alter a
Feature of the Test Critical to Performance

Accommodation                                       Percent Category 1 or 2
Use text-talk converter on a reading test           32
Have stimulus material read on a reading test       31
Have stimulus material paraphrased                  33
Use calculator on a mathematics test                25
Use spell checker on a writing test                 38
Use extra time on a timed test                      58



Discussion 

The findings in this study point to the need for further dialogue and more research on test 
accommodations. The opinions of those who influence policy and who are familiar with test 
accommodations vary too much to ignore. When one group believes that an accommodation 
alters the construct and thus should be treated differently, while another group believes that the 
accommodation maintains the integrity of the scores, there is a need for further discussion.
However, it is unlikely that discussion without empirical evidence will lead to greater agreement.

Even empirical evidence may not be enough to sway opinion. This survey seems to verify that 
beliefs about how to treat accommodated scores run deep. The fact that nearly everyone believes 
that the accommodations listed in Category 1 do not affect test scores in a way that would alter
the meaning of the scores suggests that the field should not devote precious resources to further 
empirical investigation of these. 

Although there does not appear to be a single theme underlying this list of accommodations, 
it would appear that several of the accommodations were intended primarily for students with 
either a physical or a sensory disability. It is not surprising to find accommodations meant for 
students with physical and sensory disabilities on this list. The distinction between the disability 
and the purpose of the assessment is clear for students with physical and sensory disabilities. 
However, as Phillips (1994) pointed out, this distinction is not so clear for students with learning
disabilities. The idea of accommodating students with physical disabilities resonates so well with 
so many of the people familiar with test accommodations that it is often used as a metaphor to 
illustrate the purpose of test accommodations for students with other disabilities. For example, 
Elliott, Kratochwill, McKevitt, Schulte, Marquart, and Mroch (1999) use the metaphor of an 
access ramp to illustrate how test accommodations work. Without an access ramp a student
using a wheelchair would not be able to “access” the test. They argue that accommodations are a
means to reduce the barrier of access skills. Access skills refer to the test-taking skills required
to demonstrate what one knows and can do (e.g., attention and the ability to read). Presumably
access skills, although necessary, are incidental to the construct the test was designed to
measure. However, even the notion of access skills becomes murky for many accommodations.
For instance, should reading math word problems be considered incidental to the construct of 
math problem solving? 

Another accommodation that received almost unanimous assignment to Category 1 is small 
group administration. An accommodation is defined as an alteration to standard test
administration. Each of the essential aspects of standard administration should be described in the test
procedures manual. Furthermore, one would assume that all procedures essential to standard 
administration are in place during the field-testing. One may wonder whether group size is 
defined in the test procedures manuals, and whether a uniform sized group is used at every site 
in the field test. If the two preceding conditions are not met, one could argue that small group 
administration does not constitute a testing accommodation. The decision to treat small-group 
administration as an accommodation is particularly important because it is one of the most 
frequently used accommodations. 



The extent of the variability in the respondents’ perceptions in this survey may simply reflect the
differences in the opinions researchers have regarding the way in which the effectiveness of an
accommodation is demonstrated. Much of the recent accommodations research has dealt with
the extent to which an accommodation boosts test performance (Elliott, Kratochwill, McKevitt,
Schulte, Marquart, & Mroch, 1999; Fuchs, Fuchs, Eaton, Hamlett, & Karns, 2000; Thompson,
Blount, & Thurlow, 2002; Tindal, Helwig, & Hollenbeck, 1999). Although a performance
boost may be necessary to conclude that an accommodation was effective, it is not sufficient to
conclude that the accommodation was valid. Overemphasizing a test score boost may have led
people less familiar with measurement theory to conclude that an accommodation is valid if it
boosts performance. It might also explain why IEP teams tend to over-accommodate; IEP teams
may try any accommodation that may boost performance. More research is needed to examine
whether accommodated tests alter the validity of the scores. Furthermore, accommodations 
research on performance boost should always acknowledge that a boost does not imply that the 
accommodated scores are a valid measure of the construct. 

The variability in the perceptions about how accommodated scores should be treated may be 
due in part to the lack of a sound measurement model for accommodations. The justification 
for accommodations is based largely on the belief that accommodations level the playing-field. 
What does it mean to level the playing-field? For an accommodation to level the playing-field, 
it must be assumed that standard testing conditions impinge on the performance of students 
with disabilities. Performance here is considered in the maximal sense. Test score theory posits 
a “true” score, which is defined as the average performance over repeated testing with the same 
pool of items under the same conditions. However, accommodations change those conditions;
therefore, the notion of true score no longer applies. It is beyond the scope of this paper to 
introduce a new theoretical conceptualization of test accommodations, but it suffices to say 
that it may be more logical to view accommodations as a means of establishing optimal testing 
conditions. Regardless of precisely how accommodations are perceived, there is a need for
applying a testable measurement model to this concept.
Appendix A

Survey Protocol 



Presentation Accommodations                                              C1   C2   C3

1. Use visual magnifying equipment
2. Use a large-print edition of the test
3. Use audio amplification equipment
4. Use markers to maintain place
5. Have directions read aloud
6. Use a tape recording of directions
7. Have directions presented through sign language
8. Use directions that have been marked with highlighting
9. Have stimulus material, questions, and/or answer choices read aloud,
   except for a reading comprehension test
10. Use a tape-recorder for stimulus material, questions, and/or answer
    choices, except for a reading comprehension test
11. Communication devices (e.g., text-talk converter), except for a
    reading comprehension test
12. Have computer presentation of text that is not otherwise available for
    computer presentation
13. Use a calculator or arithmetic tables, except for a mathematics
    computation test
14. Use Braille or other tactile form of print
15. On a reading comprehension test, have stimulus material, questions,
    and/or answer choices presented through Sign Language
16. On a reading comprehension test, use a text-talk converter
17. On a reading comprehension test, use a tape recording of stimulus
    material, questions, and/or answer choices
18. Have directions, stimulus material, questions, and/or answer choices
    paraphrased
19. For mathematics computation test, use a calculator or arithmetic
    tables
20. Use a dictionary








Response Accommodations                                                  C1   C2   C3

1. Mark responses in test booklet
2. Mark responses on large-print answer document
3. For selected-response items, indicate responses to a scribe
4. Record responses on audio tape, except for constructed-response
   writing tests
5. For selected-response items, use sign language to indicate response
   except for constructed-response writing tests
6. Use a computer, typewriter, Braille writer, or other machine (e.g.,
   communication board) to respond
7. Use template to maintain place for responding
8. Indicate response with other communication devices (e.g., speech
   synthesizer)
9. Use graph paper to align work
10. Use spelling checker except with a test for which spelling will be
    scored
11. For constructed response items, dictate responses to a scribe
12. For a test for which writing will be scored, use a spelling checker
13. For a test for which writing will be scored, respond on a word
    processor without a spelling checker
14. Use a dictionary








Setting Accommodations                                                   C1   C2   C3

1. Take the test alone or in a study carrel with supervision
2. Take the test with a small group
3. Take the test at home or in a care facility with supervision
4. Use adaptive furniture
5. Use special lighting and/or acoustics








Timing/Scheduling Accommodations                                         C1   C2   C3

1. Take additional supervised breaks that do not result in extra time
2. Have flexible scheduling (e.g., test at a particular time of day)
3. Take test across multiple days (without resulting in extra time) for a
   test designed to be taken on a single day
4. Use extra time for a timed test
5. Take additional supervised breaks that result in extra time for any
   timed test
Accommodation by Level of Agreement Among Participants